RP-Filter: A Path-Based Triple Filtering Method for Efficient SPARQL Query Processing
Abstract
With the rapid growth of RDF data, SPARQL query processing has received much attention. Currently, most RDF databases store RDF data in a relational table called a triple table and process SPARQL queries by performing several join operations on that table. However, execution plans with many joins can be inefficient because a large amount of intermediate data is passed between join operations. In this paper, we propose a triple filtering method called RP-Filter to reduce the amount of intermediate data. RP-Filter exploits the path information in query graphs and, before the joins are executed, filters out triples that would not be included in the final results. We also propose an efficient relational operator, RFLT, which filters triples by means of RP-Filter. Experimental results on synthetic and real-life RDF data show that RP-Filter effectively reduces intermediate results and accelerates SPARQL query processing.
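The filtering idea in the abstract can be illustrated with a small sketch. The abstract does not describe the RP-Filter data structure or the RFLT operator, so everything below is an assumption made for illustration: the function names `nodes_with_incoming_path` and `filter_triples`, the sample data in `TRIPLES`, and the choice to filter on the subject side only. It shows one plausible reading of path-based filtering, in which a triple is dropped before any join if its subject cannot be reached along the predicate path that the query requires.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("a", "knows", "b"),
    ("b", "worksAt", "c"),
    ("d", "worksAt", "e"),
    ("c", "locatedIn", "f"),
    ("e", "locatedIn", "g"),
]


def nodes_with_incoming_path(triples, predicate_path):
    """Return the nodes reached by following predicate_path from any start node."""
    if not predicate_path:
        raise ValueError("predicate_path must contain at least one predicate")

    # Index the triples by predicate: predicate -> list of (subject, object).
    by_predicate = defaultdict(list)
    for s, p, o in triples:
        by_predicate[p].append((s, o))

    frontier = None  # None means the path may start at any node.
    for predicate in predicate_path:
        frontier = {
            o
            for s, o in by_predicate[predicate]
            if frontier is None or s in frontier
        }
    return frontier


def filter_triples(triples, predicate, incoming_path):
    """Keep triples with `predicate` whose subject has `incoming_path` leading into it."""
    allowed = nodes_with_incoming_path(triples, incoming_path)
    return [(s, p, o) for s, p, o in triples if p == predicate and s in allowed]


# Query shape assumed here: ?x knows ?y . ?y worksAt ?z . ?z locatedIn ?w
# Only locatedIn triples whose subject is reachable via knows/worksAt survive.
survivors = filter_triples(TRIPLES, "locatedIn", ["knows", "worksAt"])
print(survivors)
```

In this sketch the triple `("e", "locatedIn", "g")` is discarded before any join because no `knows`/`worksAt` path leads into `e`. The check is only a necessary condition, so the joins still have to run on the surviving triples; the benefit is a smaller input to those joins.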
Similar resources
SwarmGuide: Towards Multiple-Query Optimization in Graph Databases
Preliminaries. A graph database $G$ is a finite, directed, edge-labeled multigraph defined by $G = \langle N, \Sigma, E \rangle$, where $N$ is a finite set of nodes (vertices), $\Sigma$ is a set of labels, and $E \subseteq N \times \Sigma \times N$ is a set of directed, labeled edges. A path $p$ in $G$ is defined as a sequence $n_0 a_0 n_1 \cdots n_{k-1} a_{k-1} n_k$ where $n_i \in N$ for $0 \le i \le k$, and $a_i \in \Sigma$ and $\langle n_i, a_i, n_{i+1} \rangle \in E$ for $0 \le i < k$. We call the sequence of edge labels $\Sigma^*$ of...
UPSP: Unique Predicate-based Source Selection for SPARQL Endpoint Federation
Efficient source selection is one of the most important optimization steps in federated SPARQL query processing as it leads to more efficient query execution plan generation. An over-estimation of the data sources will generate extra network traffic by retrieving irrelevant intermediate results. Such intermediate results will be excluded after performing joins between triple patterns. Consequen...
A Tool for Efficiently Processing SPARQL Queries on RDF Quads
We present a tool called RIQ (RDF Indexing on Quads) for efficiently processing SPARQL queries on large RDF datasets containing quads. RIQ’s novel design includes: (a) a vector representation of RDF graphs for efficient indexing, (b) a filtering index for efficiently organizing similar RDF graphs, and (c) a decrease-and-conquer strategy for efficient query processing using the filtering index t...
Substring Filtering for Low-Cost Linked Data Interfaces
Recently, Triple Pattern Fragments (TPFs) were introduced as a low-cost server-side interface when high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be ...
Multidimensional Interfaces for Selecting Data within Ordinal Ranges
Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset; such parts can, for example, be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location, requires the lo...
Publication date: 2011